Journal of Proteome Research
● American Chemical Society (ACS)
Preprints posted in the last 90 days, ranked by how well they match Journal of Proteome Research's content profile, based on 215 papers previously published here. The average preprint has a 0.14% match score for this journal, so anything above that is already an above-average fit.
Gronning, A. G. B.; Scheele, C.
The analysis of bulk omics data, such as RNA-seq and proteomics, has enabled numerous biological discoveries. Standard analytical workflows typically comprise dimensionality reduction, group-wise statistical comparisons, functional enrichment analysis, and mapping of molecules to biological networks. Although informative, these steps are often applied independently, limiting integrative interpretation and the efficient identification of functional drivers and candidate targets. To address these limitations, we developed BioTrendFinder, an interactive web tool for exploring functional drivers in gene- and protein-level bulk omics data. BioTrendFinder employs a sample-ranking strategy to identify significant molecular trendlines that capture expression patterns across ranked sample compositions in dimensionally reduced data. These trends are integrated with statistical results, sample-group metadata and functional information from STRING and eleven bio-ontologies, enabling interactive network-based exploration and the generation of entity-ranked functional modules. BioTrendFinder's unique approach and functionalities add analytical dimensions to bulk omics data by facilitating the extraction of high-level information from alternative analytical perspectives. Using previously published proteomics and transcriptomics datasets, we demonstrate that BioTrendFinder supports both hypothesis-driven and exploratory investigations, enabling the prioritization of candidate molecular targets and effectively narrowing the search space for downstream validation steps.
Buur, L. M.; Winkler, S.; Dorfer, V.
Open modification search (OMS) strategies have gained popularity in mass spectrometry-based proteomics for identification of peptides carrying unknown or unexpected post-translational modifications. However, most OMS search engines report only the overall mass difference between the precursor and the matched peptide and do not explicitly identify or score combinations of multiple modifications at the peptide-spectrum match (PSM) level, leaving the interpretation of mass shifts to the end user and to downstream analysis tools. Here, we introduce MS Andrea, a novel OMS search engine developed to directly identify and score combinations of up to four variable modifications per peptide without having to predefine them. MS Andrea uses a sequence tag-based strategy to efficiently filter candidate peptides prior to scoring. Remaining candidates are evaluated using the MS Amanda scoring function, first considering fixed modifications only, followed by a second scoring stage in which combinations of modifications from the Unimod database are considered based on the observed mass difference and matched to the spectrum. We evaluated MS Andrea using phosphopeptide datasets from HeLa cells and Arabidopsis thaliana and compared its performance with the widely used OMS engines MSFragger and Sage. Across datasets, MS Andrea identified the highest number of PSMs at 1% false discovery rate while achieving comparable peptide-level identifications. Importantly, MS Andrea directly reports modification identities and sites at the PSM level and enables the identification of peptides having up to four variable modifications. Together, these results demonstrate that MS Andrea facilitates more detailed and interpretable characterization of peptide modifications while maintaining competitive identification performance in OMS-based proteomic analyses.
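As a rough illustration of how an observed mass difference can be explained by combinations of modifications, the sketch below enumerates combinations of up to four modifications whose summed monoisotopic mass matches a given shift. The four-entry mass table, the tolerance, and the function name explain_mass_shift are assumptions made for illustration; this is not MS Andrea's implementation, which draws on the Unimod database and additionally matches candidates to the spectrum.

```python
from itertools import combinations_with_replacement

# Monoisotopic mass shifts (Da) for a few common modifications.
# Illustrative subset only; a real search would use the full Unimod table.
MOD_MASSES = {
    "Phospho": 79.966331,
    "Oxidation": 15.994915,
    "Acetyl": 42.010565,
    "Methyl": 14.015650,
}

def explain_mass_shift(delta, max_mods=4, tol=0.005):
    """Return modification combinations whose total mass is within tol Da of delta."""
    hits = []
    names = sorted(MOD_MASSES)
    for k in range(1, max_mods + 1):
        # Combinations with replacement allow the same modification more than once.
        for combo in combinations_with_replacement(names, k):
            total = sum(MOD_MASSES[name] for name in combo)
            if abs(total - delta) <= tol:
                hits.append(combo)
    return hits
```

For example, explain_mass_shift(95.961246) returns the single combination of one oxidation plus one phosphorylation. Ambiguous shifts return several candidates, which is why spectrum-level scoring is needed to choose among them.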
Moagi, M. G.; Thatiana, F. F.; Kristof, E. K.; Arda, A. G.; Arianti, R.; Horvatovich, P.; Csosz, E.
Liquid chromatography-tandem mass spectrometry (LC-MS/MS) based proteomics, particularly data-independent acquisition (DIA), has become widely adopted in One Health approaches for biological and clinical research for quantitative protein characterization. Among the many computational tools available, DIA-NN has demonstrated superior performance; however, the primary output of the current versions is provided as a compact, compressed PARQUET file that can be difficult to interrogate without programming expertise. To address this limitation, we developed DIA-NN EasyFilter (DEF), a fast, user-friendly, KNIME-based workflow for comprehensive protein filtering and visualization. DEF integrates chromatographic peak-based filtering, curated contaminant libraries, and quantity-quality assessment, along with interactive modules for qualitative and quantitative data exploration. The workflow is optimized for efficient execution within the KNIME local desktop environment and is designed to support end-users in improving accuracy and interpretability without requiring coding skills. We provide a detailed description of how to run DEF and demonstrate its utility and robustness using published large-scale proteomics datasets, showing high comparability across studies regardless of instrument platform or dataset size.
Parmar, B.; Liu, Y.; Ghezellou, P.; Muench, C.
Advances in ultra-fast mass analyzer technology and procedural automation have enabled proteomics screening at the throughput of hundreds of proteomes per day. However, these approaches often require expensive instrumentation upgrades and robotic automation that remain inaccessible to many research laboratories and core facilities. In this study we address the feasibility of scaling up proteomic screening capabilities with minimal upgrade cost by focusing on (a) strategies for non-automated high-throughput sample preparation from 96-well cell culture, (b) data acquisition on sub-50Hz scan speed hybrid and tribrid Orbitrap instruments and (c) data analysis strategies for label-free and labeled proteomic screening. We find that the 96-well format STrap, in combination with C18 plates, provides the most robust throughput for a non-automated sample preparation workflow. Furthermore, we show that for static proteomes, an isobaric tandem mass tag-based (TMT) multiplexing approach provides deeper and more precise proteome coverage whereas label-free data-independent acquisition (DIA) is more accurate, albeit with a reduced dynamic range and more missing values. Finally, we extend the optimized workflow to proteome turnover studies using pulsed stable isotope labeling by amino acids in cell culture (pSILAC), highlighting the key advantages and trade-offs of DIA and TMT data-dependent acquisition strategies for capturing protein translation. Together, these results provide a practical framework for designing high-throughput proteomics experiments that balance throughput, depth, and quantitative accuracy using existing instrumentation, without requiring major hardware upgrades or automation.
Bai, F.; Wu, Z.; Xing, S.; Fu, Q.
Accurate biological sex determination of ancient remains is critical for archaeological, anthropological, and forensic studies, but remains challenging for morphologically ambiguous and highly degraded endogenous DNA samples. Paleo-proteomics sex identification approaches, targeting sexually dimorphic amelogenin isoforms (AMELX and AMELY), present a promising solution. However, current workflows rely on manual verification of a few specific peptide markers, a process that lacks standardization and is susceptible to false-positive AMELY signals. To overcome these limitations, we developed protSexInferer, a lightweight, open-source bioinformatic pipeline for automated sex estimation from paleo-proteomic data. Our method uses the ratio of AMELY-specific peptides to all detected AMELY- and AMELX-specific peptides (i.e., the RAMELY value) rather than the mere presence or absence of AMELY signals for sex classification. We demonstrated that the RAMELY value clearly distinguishes male and female individuals in both reference and independent validation datasets, enabling reliable sex assignments even in cases where conventional intensity-based comparisons (e.g., AMELY-59M vs. AMELX-60) are ambiguous. This ratio-based approach effectively mitigates the impact of false-positive AMELY signals, therefore eliminating the need for time-consuming manual verification, and remains reliable even for samples with low peptide yields. Equipped with pre-constructed protein reference databases, protSexInferer provides a robust, standardized, and end-to-end solution for paleo-proteomic sex determination.
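The ratio described above can be sketched in a few lines. This is a minimal illustration, assuming the ratio is computed from counts of sex-specific peptides; the function names and the classification threshold are hypothetical and are not taken from protSexInferer.

```python
def ramely(amely_peptides, amelx_peptides):
    """Ratio of AMELY-specific peptides to all AMELY- and AMELX-specific peptides."""
    total = amely_peptides + amelx_peptides
    if total == 0:
        return None  # no sex-specific amelogenin peptides detected
    return amely_peptides / total

def classify_sex(ratio, threshold=0.05):
    # threshold is a hypothetical cutoff chosen for illustration, not a published value
    if ratio is None:
        return "undetermined"
    return "male" if ratio >= threshold else "female"
```

Under these assumptions, a sample with 3 AMELY-specific and 9 AMELX-specific peptides has a ratio of 0.25. A single spurious AMELY hit among many AMELX peptides would stay below the cutoff, which is the intuition behind using a ratio rather than presence or absence.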
Ambrose, E. A.; Kandasamy, G.; Meulener, M. M.; Zhang, F.
Many proteomics protocols rely on enzymatic digestion of complex protein mixtures to generate peptides with predictable cleavage patterns for the mass spectrometry analysis. One of the most utilized enzymes, trypsin, is classically defined as a serine endopeptidase with high specificity for cleaving peptide bonds on the C-terminal side of internal lysine and arginine residues. Accordingly, trypsin is not expected to remove the N-terminal arginine, which may arise through posttranslational modification such as arginylation or by proteolysis exposing internal residues as the new N-termini. N-terminal arginine plays important biological roles, including functioning as an N-degron and modulating protein interactions/signaling through its positive charge. Curiously, prior mass spectrometry-based studies utilizing trypsin to identify proteins bearing N-terminal arginine have frequently reported low and inconsistent yields, suggesting potential systematic bias in current proteomic approaches. Here, we explored whether trypsin would affect the integrity of the N-terminal arginine. By using antibodies specifically recognizing N-terminal arginine of different peptides, and by using mass spectrometry peptide analysis, we show that trypsin can remove N-terminal arginine residues in an exopeptidase-like manner. This effect occurs across a range of digestion conditions consistent with standard proteomic workflows, on peptides or whole proteins, and depends on trypsin concentration, incubation time, and catalytic activity. In addition, we show that the alternative arginine-cleavage enzyme Arg-C can also affect N-terminal arginine in a sequence-dependent context. In contrast, Lys-C and LysargiNase do not exhibit such effects, providing suitable alternative digestion strategies. 
Together, these findings reveal an unappreciated enzymatic behavior of arginine-cleaving proteases and suggest that their widespread use may systematically compromise the detection of N-terminal arginine in proteomic studies.
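For reference, the classical endopeptidase model that this abstract challenges can be written as a simple in silico digest. The sketch below is an assumption-laden illustration: it cleaves after internal lysine and arginine residues only, ignores missed cleavages and the commonly applied proline exception, and the function name tryptic_digest is hypothetical.

```python
def tryptic_digest(sequence):
    """Cleave C-terminal to internal K/R residues (classical trypsin model)."""
    if not sequence:
        return []
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        # Terminal residues are not "internal", so no cleavage is modelled there.
        is_internal = 0 < i < len(sequence) - 1
        if residue in "KR" and is_internal:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])
    return peptides
```

Under this model, digesting "RAGKTIDEK" yields "RAGK" and "TIDEK", so the N-terminal arginine is retained. The exopeptidase-like removal reported above is precisely the case this textbook rule does not predict.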
Dunlop, F. M.; Mason, S.; Hafizi Rastabi, N.; Alexander, S. E.; Robatjazi, S.; Davis, J.; Laird, C.; Kang, T.; Mathivanan, S. E.; Russell, A. P.
Extracellular vesicles (EVs) are promising biomarkers, yet their proteomic analysis from plasma is hampered by low abundance, co-purification of contaminants (e.g., lipoproteins, platelets), and technical variability, particularly in small-volume animal models. We developed and validated a modular protocol integrating Size Exclusion Chromatography (SEC) with Strong Anion Exchange (SEC-SAX) specifically tailored for quantitative LC-MS proteomics from small starting volumes (150 µl of plasma). SEC alone successfully removed 99% of Albumin, and the SAX step significantly enriched EVs over contaminating lipoproteins. Downstream single pot solid phase enhanced (SP3) sample prep and STAGE tip solid phase extraction ensured maximum proteome depth. Critical confounding factors were objectively assessed: Platelet Factor 4 (PF4) was confirmed as a highly sensitive platelet marker, confirming the necessity of meticulous plasma preparation. Sample hemolysis impacted the plasma EV proteome data. As such, an objective measure (nanodrop spectrophotometer) of hemolysis and exclusion of hemolysed samples (heme >0.3 mg/ml) is recommended. The protocol is applicable to both human and mouse plasma as demonstrated by EV enrichment and quantification of biomarker proteins associated with neurodegenerative diseases from eight individual mouse plasma samples.
Manuscript Highlights
- Developmental workflow for a quantitative SEC-SAX protocol for EV proteomics from small plasma volumes (150 µl).
- A range of variables tested including SAX beads amount, digestion buffer, digestion time, STAGE tip solid phase extraction, SAX elution buffer and sample filtration.
- The SAX step significantly enhances EV proteome depth by increasing EV purity in relation to ApoB lipoproteins.
- Shows the impact of the major confounding factors of sample hemolysis and platelet contamination on the EV proteome.
- Platelet contamination increases the number and abundance of proteins detected including known disease biomarkers and sample hemolysis is associated with proteins derived from platelet and red blood cell derived EVs.
- Platelet Factor 4 (PF4) is identified and confirmed as a sensitive marker for platelet contamination.
- Applicable to both human and mouse plasma.
Paradeisi, F.; Gonidaki, C.; Tserga, A.; Courraud, J.; Bakouros, P.; Karousi, P.; Kostopoulos, I. V.; Margelos, T.; Goula, E.; Stegehuis, C.; Meylahn, J. M.; Martzakli, A.; Liacos, C. I.; Dimopoulos, M. A.; Tsitsilonis, O.; Vlahou, A.; Zoidakis, J.; Kastritis, E.
Background: Multiple myeloma (MM) remains incurable despite therapeutic advances, reflecting limited understanding of the molecular mechanisms underlying disease initiation and progression. MM develops through asymptomatic precursor stages, monoclonal gammopathy of undetermined significance (MGUS) and smouldering multiple myeloma (SMM). This study aimed to investigate protein changes associated with disease progression and, through a further integrative approach, to highlight molecular changes of potential predictive and/or therapeutic value. Methods: We performed a comparative proteomic analysis of 94 bone marrow-derived CD138+-selected plasma cell samples (29 MGUS, 20 SMM, and 45 MM) using LC-MS/MS. Differential protein abundance was assessed using pairwise Mann-Whitney U tests between groups, with Benjamini-Hochberg correction. Pathway enrichment, protein-protein interaction, and co-expression network analyses were also conducted. Selected proteins were further evaluated using public transcriptomic datasets and experimentally validated in independent samples by flow cytometry and enzyme-linked immunosorbent assay (ELISA). Results: Following data processing, proteomic analysis identified 6,203 proteins. Pairwise comparisons revealed significant proteomic differences across disease stages, with 370 differentially abundant proteins exhibiting monotonic changes during disease progression. Pathway analysis showed that monotonically upregulated proteins were mainly associated with gene expression and cell proliferation, whereas downregulated proteins were linked to immune-related processes. Further co-expression network analysis, combined with criteria including detection frequency, biological relevance, and translational potential, highlighted a group of prioritised proteins. 
Representative examples include nucleolin (NCL) and U3 small nucleolar ribonucleoprotein IMP3 (IMP3), involved in nucleolar organisation, ribosome biogenesis and rRNA processing, as well as the immune-associated lactotransferrin (LTF) and serine protease cathepsin G (CTSG). Transcriptomic support and independent experimental validation by flow cytometry and ELISA confirmed the relevance of selected candidates. Conclusions: Taken together, our findings highlight coordinated changes in immune regulation, RNA processing and ribosome biogenesis during MM progression and identify candidate proteins and their networks, including the emerging pharmacologically tractable target NCL and the underexplored IMP3 of potential therapeutic relevance, opening new avenues for further investigation.
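The Benjamini-Hochberg correction named in the Methods can be sketched as follows. This is a generic textbook version of the step-up procedure written for illustration, not the authors' code; the function name is an assumption, and the Mann-Whitney U tests that would supply the p-values are not shown.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A protein would then be called differentially abundant when its adjusted p-value falls below the chosen significance level, for each pairwise comparison between disease stages.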
Nishizaki, M.; Araki, N.; Kawano, S.
Motivation: The rapid expansion of proteomic data has created new opportunities for large-scale integrative analyses. However, substantial variability across platforms, experimental designs, and processing pipelines limits direct quantitative comparisons among studies. Differential proteomic changes between conditions are often considered to be more reproducible than absolute abundances and may therefore provide a robust basis for cross-dataset integration. However, the systematic ability of differential change-based approaches to capture biologically meaningful relationships across heterogeneous datasets remains unclear. Results: We developed a differential-change framework and applied it to public proteomic datasets. Pairwise contrasts were defined as differential proteomic profiles, and the concordance of up- and down-regulated proteins was quantified using odds ratios. Significant profile pairs were visualized as an integrative network. The comparison of treatment with the anti-cancer drug doxorubicin vs control (MCF-7) emerged as a central hub, with breast cancer proteome profiles clustering around it and associating with tumor stage (p = 0.03). Enrichment analysis revealed overrepresentation of lipid- and cholesterol-related pathways. Availability and implementation: The source code for proteome network integration is available at https://github.com/manakanishizaki/proteome-network-integration.git.
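One way to quantify directional concordance between two differential profiles with an odds ratio is sketched below. The 2x2 table layout, the pseudocount, and the function name are assumptions for illustration and may differ from what the linked repository does.

```python
def concordance_odds_ratio(profile_a, profile_b, pseudocount=0.5):
    """Odds ratio for directional agreement between two differential profiles.

    Each profile maps protein -> "up" or "down"; only shared proteins are compared.
    """
    shared = profile_a.keys() & profile_b.keys()
    counts = {("up", "up"): 0, ("up", "down"): 0,
              ("down", "up"): 0, ("down", "down"): 0}
    for protein in shared:
        counts[(profile_a[protein], profile_b[protein])] += 1
    # A small pseudocount avoids division by zero in sparse tables.
    a = counts[("up", "up")] + pseudocount
    b = counts[("up", "down")] + pseudocount
    c = counts[("down", "up")] + pseudocount
    d = counts[("down", "down")] + pseudocount
    return (a * d) / (b * c)
```

Values above 1 indicate that proteins tend to change in the same direction in both contrasts, and values below 1 indicate opposing changes. A significance test on the same table (for example Fisher's exact test) would be needed before drawing an edge in a network.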
Wen, B.; Paez, J. S.; Hsu, C.; Canzani, D.; Chang, A. T.; Shulman, N.; MacLean, B. X.; Berg, M. D.; Villen, J.; Fondrie, W.; Pino, L.; MacCoss, M. J.; Noble, W. S.
Data-independent acquisition (DIA) proteomics enables reproducible and systematic peptide detection and quantification, and trapped ion mobility spectrometry (TIMS) on the timsTOF platform further improves DIA by synchronizing ion mobility separation with quadrupole precursor sampling. Analyzing the highly multiplexed spectra generated by DIA typically relies on spectral libraries, and fully leveraging the additional ion mobility dimension requires these libraries to include accurate retention time, fragment ion intensity, and ion mobility annotations. Existing in silico spectral library generation tools either lack ion mobility support entirely or rely on models trained on data-dependent acquisition (DDA) data, which can introduce a mismatch and fail to capture experiment-specific biases when applied to a given timsTOF dataset. Carafe is a software tool that uses deep learning models to generate high-quality, experiment-specific in silico libraries by training directly on DIA data. In this study, we extend Carafe to generate libraries for timsTOF DIA data, which involves fine-tuning retention time (RT), fragment ion intensity, and ion mobility prediction models using timsTOF DIA data. Carafe2 operates directly on native timsTOF raw data (Bruker .d directories) without the need for data conversion. We demonstrate the performance of Carafe2 across a wide range of DIA applications, including global proteome, phosphoproteome, and plasma proteome datasets. Comparing Carafe2 fine-tuned RT, fragment ion intensity, and ion mobility prediction models with pretrained DDA models, we find that Carafe2 models outperform pretrained models on a variety of DIA datasets.
We then demonstrate the utility of in silico libraries generated by Carafe2 for peptide detection on several different types of timsTOF DIA datasets by comparing with the libraries generated with DDA-trained AlphaPeptDeep models, DIA-NN built-in models, and empirical spectral libraries generated from DDA experiments.
Argentini, A.; Fernandez Fernandez, E.; Pauwels, J.; Gevaert, K.
Summary: Data-independent acquisition (DIA) has become the preferred data acquisition method for mass spectrometry-based proteomics, yet reproducible workflows for differential expression (DE) analysis and reporting results remain limited. We present DiaReport, an R package that performs precursor- and protein-level DE analysis from DIA-NN output using MSqRob and QFeatures, while generating high-quality, interactive HTML reports through Quarto. DiaReport integrates precursor data, filtering of missing values, normalization, protein summarization and statistical modeling within a single function, supporting both simple pairwise and complex experimental designs. The package provides structured outputs and configuration files to ensure computational reproducibility across different studies. To accommodate diverse research needs, DiaReport includes multiple reporting templates tailored to different proteomic applications. Applying DiaReport to an extracellular vesicle (EV) proteomics dataset demonstrates its ability to efficiently analyze DIA data and provide rapid insights into sample quality and protein level differences. Availability and Implementation: DiaReport is an open-source R package available at https://github.com/Gevaert-Lab/diareport. The package is platform-independent and distributed under the MIT license. Reports are generated using Quarto and require only standard R dependencies. Detailed documentation, installation guides and usage vignettes are provided within the repository. The interactive HTML reports discussed in this study, including the UPS2 benchmark and EV case study, are archived on Zenodo (https://doi.org/10.5281/zenodo.18632744 and https://doi.org/10.5281/zenodo.18632731). Contact: Corresponding author: Kris Gevaert; kris.gevaert@vib-ugent.be.
Dupas, A.; Ibranosyan, M.; Ginevra, C.; Jarraud, S.; Lemoine, J.
Understanding allelic variability is crucial for elucidating intrinsic bacterial mechanisms and distinguishing phenotypic profiles. However, such variability poses a major challenge for the reliable identification of proteins in data-independent acquisition (DIA) proteomics. To address this, we developed an analytical workflow that integrates protein sequence variability to enhance proteome coverage. Fifteen Legionella pneumophila isolates were analyzed using DIA-NN, with spectral libraries generated either from a reference proteome or incorporating allelic variability. Our workflow includes protein clustering and subsequent protein inference from these clusters, allowing the accurate assignment of shared and variant-specific peptides. Integration of variability enabled the identification of a comparable number of proteins as the reference proteome while capturing between 28 and 77 % of variant-specific sequences in each isolate, all while maintaining a low false positive rate. These findings demonstrate that accounting for allelic variability substantially improves proteomic coverage and identification confidence, providing a more comprehensive view of the proteome. This approach facilitates a deeper understanding of biological mechanisms and enables precise bacterial proteotyping of Legionella pneumophila isolates.
Valdes-Tresanco, M. E.; Wacker, S.; Valdes-Tresanco, M. S.; Plakhotnyk, A.; Brodie, N. I.; Hepburn, M.; Ulke-Lemee, A.; Huttlin, E. L.; Lewis, I. A.
Over the past years, proteomics has moved increasingly towards the analysis of large cohorts of biological specimens. This has been made possible by significant improvements in mass spectrometry technology, chromatographic separation methods, and improved data acquisition strategies. These technological advances now routinely enable experiments that yield vast datasets that substantially outstrip the capacity of existing proteomics data analysis approaches. Processing such large datasets requires purpose-built, quality control tools designed to organize and analyze the data while recording all processing parameters for reproducibility. To address this need, we developed an open-source, Python-based software platform, Large-scale Automated Multi-level Proteomics Evaluation by Python (LAMPrEY), a comprehensive quality-control pipeline for quantitative proteomics analyses of large cohorts of samples. LAMPrEY features GUI-based file submission, automated processing with MaxQuant and RawTools, an interactive analytics dashboard, and an application programming interface (API) for programmatic usage that collectively enable rapid, reproducible analysis and interpretation of proteomics data. We demonstrate the longitudinal monitoring and analytical capabilities of LAMPrEY using TMT11 quantitative proteomics data generated from 910 Enterococcus faecium isolates collected from bloodstream infection patients. LAMPrEY is an open-source software that can be accessed at www.lewisresearchgroup.org/software.
Antony, F.; Bhattacharya, A.; Duong van Hoa, F.
Peptergents are a novel class of amphipathic peptides that enable detergent-free extraction and purification of membrane proteins (MPs). These designed peptides self-assemble around hydrophobic transmembrane regions of proteins, forming stable, water-soluble assemblies that can be isolated directly from biological membranes. By doing so, Peptergents bypass the limitations imposed by traditional detergents, which often destabilize proteins and restrict downstream analyses. Since detergents are completely avoided, Peptergent-isolated MPs are directly amenable to structural and mass spectrometry (MS) analysis, thereby addressing their persistent underrepresentation in proteomic datasets and improving their accessibility for drug-screening strategies. Here, we describe a streamlined protocol for isolating MPs with the Peptergent PDET-1, followed by exchange into His-tagged Peptidiscs for Ni-NTA-based affinity purification. The method comprises membrane isolation, peptide preparation, protein extraction, clarification, and exchange of MPs from Peptergent to Peptidiscs. Application of this workflow yields enriched membrane proteomes compatible with downstream LC-MS/MS analysis, with improved recovery of hydrophobic and multi-pass membrane proteins.
Key features
- Direct extraction and solubilization of membrane proteins in Peptergents
- Exchange into His-tagged Peptidiscs enabling affinity purification of MPs
- 100% detergent-free workflow compatible with LC-MS/MS analysis
- Applicable to cultured cells and tissue-derived membrane fractions
In Brief: We describe a Peptergent-based workflow for isolating membrane proteins directly from membrane preparations. Proteins are extracted with the Peptergent peptide scaffold (PDET-1) and transferred into His-tagged Peptidisc (HD-43). The water-soluble membrane proteins are enriched by Ni-NTA affinity purification and prepared for bottom-up mass spectrometry, yielding enriched membrane proteomes and dried peptide samples ready for LC-MS analysis.
Fichtner, I. D.; Temesvari-Nagy, L.; Sahm, F.; Gerstung, M.; Bludau, I.
Summary: ProteoPy is a lightweight Python library for protein- and peptide-level quantitative proteomics analysis, built around the AnnData class as its core data structure. It streamlines data import, preprocessing, and differential analysis while preserving all metadata within a single object. A reimplementation of our previously published COPF algorithm enables proteoform group inference directly from peptide-level data, facilitating the identification of proteoform-specific regulation and isoform usage. Designed for accessibility and flexibility, ProteoPy simplifies analysis for non-specialists and provides an extensible foundation for advanced proteomics workflows, seamlessly integrating with the scanpy and muon ecosystems for reproducible and scalable multi-omics analysis. Availability and implementation: ProteoPy is implemented in Python 3 and publicly available on GitHub: https://github.com/UKHD-NP/proteopy under the Apache 2.0 license. Contact: isabell.bludau@med.uni-heidelberg.de. Supplementary information: Tutorial notebooks for ProteoPy are included as supplementary data and are also available on GitHub: https://github.com/UKHD-NP/proteopy/tree/main/docs/tutorials.
Winkelhardt, D.; Berres, S.; Uszkoreit, J.
Peptide-spectrum match (PSM) rescoring has become standard in proteomics workflows, improving peptide identification accuracy across diverse search engines. Despite the availability of multiple rescoring strategies, systematic comparisons spanning several search engines, datasets, and database configurations remain limited. Here, we benchmarked seven publicly available search engines, evaluating standard target-decoy-based false discovery rate (FDR) estimation alongside Percolator, MS2Rescore, and Oktoberfest across four datasets acquired on different mass spectrometry platforms and searched against protein databases of varying size and composition. Rescoring substantially increased identification consensus and reduced variability between search engines, with prediction-based approaches yielding the largest gains. While database size had limited impact for human datasets, it significantly affected identification rates on a metaproteomic dataset. Entrapment-based evaluation indicated generally adequate FDR control across methods, although prediction-based rescoring exhibited a slightly higher tendency toward FDR underestimation in specific configurations. Overall, advanced rescoring strategies harmonize peptide identification outcomes across search engines, thereby enhancing robustness and comparability in proteomics analyses. However, careful feature selection and appropriate database choice remain essential to ensure reliable FDR control and optimal performance across diverse experimental settings.
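The standard target-decoy FDR estimation used as the baseline here can be illustrated with a short sketch. It assumes a concatenated target-decoy search, a higher-is-better score, and the simple decoys/targets estimator; the function name and input layout are hypothetical, and tools such as Percolator use more elaborate models.

```python
def target_decoy_qvalues(psms):
    """psms: list of (score, is_decoy). Returns (score, is_decoy, q_value), best score first."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at least this well.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the minimum FDR at this rank or at any lower-scoring rank (capped at 1).
    qvalues, running_min = [0.0] * len(ranked), 1.0
    for i in range(len(ranked) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return [(s, d, q) for (s, d), q in zip(ranked, qvalues)]
```

Filtering the returned list to target PSMs with a q-value of at most 0.01 gives the familiar 1% FDR identification set. Rescoring changes the scores fed into this step, not the estimation itself.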
Mukonyora, M.
Hair has applications in biomarker discovery and forensics, yet the influence of proteomics software tools on hair proteome characterisation remains underexplored. This study compares four bottom-up proteomics workflows (MaxQuant, FragPipe, MetaMorpheus, and SearchGUI/PeptideShaker). Publicly available hair proteomes were analysed following extraction with 1-dodecyl-3-methylimidazolium chloride (DMC), sodium dodecanoate (SDD), sodium dodecyl sulfate (SDS), and urea. Data were acquired on Orbitrap-based DDA platforms. Peptide identification, protein inference, functional annotation, physicochemical properties, and label-free quantification (LFQ) were evaluated. Peptide-level performance differed across tools. MS-GF+ and FragPipe identified the most unique peptides, while X!Tandem reported the fewest. Protein inference diverged from peptide-level results. MetaMorpheus reported the highest number of protein groups despite having only the third-highest peptide count. FragPipe and MaxQuant followed, while PeptideShaker consistently inferred the fewest proteins. Protein-level concordance was low, with only 30.3% overlap across tools and extraction methods. These differences extended to downstream analyses. Functional enrichment showed moderate concordance (38.25% overlap). Physicochemical profiles varied, with MetaMorpheus identifying more hydrophobic proteomes and PeptideShaker more hydrophilic profiles. At the quantitative level, reproducibility depended on extraction buffer. SDS and urea showed lower variability (CV ≤ 0.025), while DMC and SDD showed higher variability (up to 0.10). Absolute LFQ intensities and differential expression outputs varied across tools despite moderate to strong correlation (r = 0.77 to 0.93). Overall, software choice influences proteome coverage, physicochemical profiles, and quantitative outcomes. Relative trends were partially conserved, but magnitude and significance varied.
These findings support careful method selection and multi-tool validation in hair proteomics.
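For readers unfamiliar with the variability metric quoted above, the coefficient of variation (CV) is the standard deviation divided by the mean. The sketch below is illustrative only. The abstract does not state whether CVs were computed on raw or log-transformed LFQ intensities, so the input scale is an assumption, and the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of replicate values.

    Assumption: `values` are replicate intensities for one protein on
    whatever scale the analysis uses; the abstract does not specify it.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean
```

A CV of 0.10 therefore means the replicate spread is about one tenth of the mean value.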
Schramm, T.; Gillet, L.; Reber, V.; de Souza, N.; Gstaiger, M.; Picotti, P.
Peptide-level analyses are becoming increasingly popular in mass spectrometry-based proteomics and are being applied, for example, in immunopeptidomics, structural proteomics, and analyses of post-translational modifications. In such analyses, peptides that are not biologically meaningful but instead arise as artifacts prior to mass spectrometry analysis pose the risk of data misinterpretation. Here, we describe an approach based on retention time analysis and precise chromatographic peak matching to identify peptides generated by in-source fragmentation (ISF), which occurs between chromatographic separation of peptide mixtures and the first mass filter of a tandem mass spectrometer (MS). To understand the prevalence and properties of ISF, we generated 13 proteomics datasets and analyzed them along with an additional 25 previously published datasets spanning a broad range of sample types, mass spectrometers, and proteomics approaches, including classical bottom-up proteomics, immunopeptidomics, structural proteomics, and phosphoproteomics. We found that, in typical trypsin-digested samples, on average 1% of fully tryptic peptides and 22% of semi-tryptic peptides originated from ISF. However, we observed large variations between datasets, and in-source fragments exceeded, in some cases, a third of the total peptide identifications. The extent of ISF was dependent on the peptide sequence, the instrument, method parameters, and sample complexity. Although ISF did not impair relative quantification across samples, it generated peptides that could be misinterpreted qualitatively, inflated peptide identifications, and comprised up to 37% of peptides shorter than 9 amino acids in immunopeptidomics datasets. We propose that, for peptide-centric applications, our open-source ISF detection approach be used to re-annotate peptides generated by ISF and remove them to avoid misinterpretation of data.
ISF is an increasing concern with improving mass spectrometers, as they enable detection of an ever-increasing number of m/z features, including low abundance features like ISF products. Our work thus addresses a growing issue in proteomics and presents solutions to mitigate the impact of in-source fragment peptides. In the future, improved feature detection algorithms may enable elucidation of new ISF patterns affecting side chains that have been missed so far, which could contribute to explaining the vast space of as-yet unannotated proteomics data.
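The idea of flagging in-source fragments by retention time can be sketched as follows. This is not the authors' open-source tool. It assumes that an in-source fragment appears as an N- or C-terminal truncation of a longer identified peptide and that both share the same chromatographic apex within a small tolerance. The `Peptide` type, the `flag_isf_candidates` function, and the `rt_tol` default are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    sequence: str
    apex_rt: float  # retention time of the chromatographic apex, in minutes


def flag_isf_candidates(peptides, rt_tol=0.05):
    """Return (fragment, parent) pairs that look like in-source fragments.

    Assumptions: a fragment is a strict prefix or suffix of its parent
    sequence, and the two co-elute within `rt_tol` minutes.
    """
    flagged = []
    for frag in peptides:
        for parent in peptides:
            if len(parent.sequence) <= len(frag.sequence):
                continue  # a parent must be longer than its fragment
            is_truncation = (
                parent.sequence.startswith(frag.sequence)
                or parent.sequence.endswith(frag.sequence)
            )
            co_elutes = abs(parent.apex_rt - frag.apex_rt) <= rt_tol
            if is_truncation and co_elutes:
                flagged.append((frag, parent))
                break  # one matching parent is enough to flag the fragment
    return flagged
```

A genuine truncated peptide present in the sample would normally elute at a different time than its longer counterpart, which is the rationale for requiring co-elution here. Peak-shape matching, as described in the abstract, would add a stricter check than the apex comparison used in this sketch.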
Cain, S. A.; Fatima, M.; Humphries, M.
Manchester Proteome Profiler (MPP) is an open-source R Shiny application that streamlines downstream analysis of quantitative proteomic data. Compatible with grouped protein intensity tables from MaxQuant, FragPipe, Proteome Discoverer and other custom layouts, MPP provides an integrated platform for filtering, normalisation, imputation, differential expression analysis and cluster analysis across user-chosen experimental conditions. MPP supports both single- and dual-dataset comparisons, incorporates SAINTexpress for affinity purification and proximity labelling experiments, and links clusters of significant proteins to functional enrichment and interaction networks via Gene Ontology, BioGRID and STRING. Benchmarking with a KRAS proximity biotinylation dataset demonstrated the ability of MPP to identify reproducible clusters of differentially expressed proteins and reveal biologically meaningful patterns, including enrichment of solute carrier transporters and adhesion molecules. With interactive visualisations, customisable reports, and support for complex experimental designs, MPP offers a novel, versatile and user-friendly environment for proteomic data exploration and hypothesis generation.
Van Leene, C.; Araftpoor, E.; Gevaert, K.
Limited proteolysis coupled to mass spectrometry (LiP-MS) is a peptide-centric conformational proteomics approach in which a brief incubation with a non-specific protease (e.g., proteinase K) under native conditions generates structural fingerprints that report on treatment-induced conformational changes; a subsequent tryptic digest under denaturing conditions allows these fingerprints to be read out [1]. In contrast, the recently introduced peptide-centric local stability assay (PELSA) uses a high trypsin-to-substrate ratio under native conditions to release fully tryptic peptides that reflect structural stability upon ligand binding [2]. In their paper, Li et al. compared PELSA and LiP-MS across several benchmarks and reported that PELSA exhibited quantitative sensitivity comparable to or exceeding that of LiP-MS. Notably, PELSA quantified a 21-fold greater rapamycin-induced change for FKBP1A than LiP-MS did. Because such claims influence method selection for conformational proteomics, we reanalyzed the publicly deposited datasets underlying these comparisons and assessed the experimental and analytical choices that contributed to the reported effect sizes. Our evaluation indicates that the reported 21-fold difference arises from non-matched experimental conditions and undisclosed data imputation, and that conclusions regarding quantitative superiority or biological interpretability should therefore be treated with caution.